Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies
نویسندگان
چکیده
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consist of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs to phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. Our method scales in runtime quadratic in the number of individuals being studied with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.
منابع مشابه
I-40: Male Genome Programming, Infertility and Cancer
Background: During male germ cells differentiation, genomewide re-organizations and highly specific programming of the male genome occur. These changes not only include the large-scale meiotic shuffling of genes, taking place in spermatocytes, but also a complete “re-packaging” of the male genome in post meiotic cells, leading to a highly compacted nucleo-protamine structure in the mature sperm...
متن کاملThe Pattern of Linkage Disequilibrium in Livestock Genome
Linkage disequilibrium (LD) is bases of genomic selection, genomic marker imputation, marker assisted selection (MAS), quantitative trait loci (QTL) mapping, parentage testing and whole genome association studies. The Particular alleles at closed loci have a tendency to be co-inherited. In linked loci this pattern leads to association between alleles in population which is known as LD. Two metr...
متن کاملGenome Wide Association Studies, Next Generation Sequencing and Their Application in Animal Breeding and Genetics: A Review
Recently genetic studies have been revolutionized by next generation sequencing (NGS) technology, and it is expected that the use of this technology will largely eliminate defects in the methods of association studies. The NGS technology is becoming the premier tool in genetics. However, at the moment the use of this method is limited especially in the livestock due to high cost and computation...
متن کاملP83: Role of Neuregulin 3 Genes Expression on Attention Deficits in Schizophrenia
Genetic epidemiological studies strongly suggest that additive and interactive genes, each with small effects, mediate the genetic vulnerability for schizophrenia. With the human genome working draft at hand, candidate gene (and ultimately large-scale genome-wide) association studies are gaining renewed interest in the effort to unravel the complex genetics of schizophrenia. Linkage and fine ma...
متن کاملIncorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies
Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly because genes may interact with each other to affect disease risk. Much knowledge has been accumulat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 6 شماره
صفحات -
تاریخ انتشار 2011